Jonathan Tang
Real Estate Market Analysis
Data Mining/Scraping
Python
We created a Python script which takes text files of property listings from the RE/MAX website and ranks them based on how well they fit our predefined criteria.
We gathered 15 text files from RE/MAX, using Jonathan's hometown Monterey Park, CA as a search example.
Above are the criteria that we are searching for in the properties, which can be edited to our clients' liking.
Here's a code snippet from the Python script:
Using our Python script, we ranked the 15 properties by how many of our criteria they met.
824 S Garfield Ave and 135 W Newmark Ave Apt. A met all 4 criteria, scoring 4 points.
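The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the four criteria, the `Listing` fields, and the sample addresses are all made up, and the step that parses the RE/MAX text files is left out because their layout is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    address: str
    price: int
    beds: int
    baths: int
    sqft: int


# Hypothetical criteria: a label plus a test that returns True or False.
CRITERIA = {
    "price at most $900,000": lambda p: p.price <= 900_000,
    "at least 3 bedrooms": lambda p: p.beds >= 3,
    "at least 2 bathrooms": lambda p: p.baths >= 2,
    "at least 1,500 sq ft": lambda p: p.sqft >= 1_500,
}


def score(listing):
    """Return how many criteria the listing meets (one point each)."""
    return sum(1 for test in CRITERIA.values() if test(listing))


def rank(listings):
    """Sort listings from most to fewest criteria met; ties keep input order."""
    return sorted(listings, key=score, reverse=True)


# Made-up sample data, not taken from the RE/MAX files.
sample = [
    Listing("1 Example St", 850_000, 4, 3, 2_000),
    Listing("2 Example Ave", 950_000, 2, 1, 1_100),
    Listing("3 Example Blvd", 700_000, 3, 1, 1_600),
]

for listing in rank(sample):
    print(f"{score(listing)}/{len(CRITERIA)}  {listing.address}")
```

Keeping the criteria in one dictionary means they can be swapped out for a client's preferences without touching the scoring or ranking code.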
We wrangled a .json dataset from data.ct.gov containing all public real estate sales in the state of Connecticut from 2001 to 2020.
The dataset is 286 MB and contains over 997,000 listings.
Metadata:
Example listing:
Using a Python script, we calculated the median sale price, sales ratio, and number of sales in 5-year intervals in Connecticut.
Here's a code snippet from the Python script:
And here is the output:
For full insights and recommendations, see our report at this link:
https://colab.research.google.com/drive/1POQchp4YoYtba_NeKvmjXo5mqAZ9of4P?usp=sharing
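As a rough sketch of the 5-year aggregation described in this section, the grouping could look like the code below. The record field names (`list_year`, `sale_amount`, `sales_ratio`) and the sample values are assumptions for illustration, not the dataset's verified schema, and the sales ratio is summarized with a median here, which is also an assumption.

```python
from collections import defaultdict
from statistics import median

# Made-up records; field names are assumed, not the dataset's actual keys.
records = [
    {"list_year": 2001, "sale_amount": 200_000, "sales_ratio": 0.70},
    {"list_year": 2003, "sale_amount": 300_000, "sales_ratio": 0.60},
    {"list_year": 2004, "sale_amount": 250_000, "sales_ratio": 0.80},
    {"list_year": 2006, "sale_amount": 400_000, "sales_ratio": 0.50},
    {"list_year": 2010, "sale_amount": 350_000, "sales_ratio": 0.90},
]

START_YEAR = 2001  # first year covered by the dataset
SPAN = 5           # width of each interval in years


def interval_start(year):
    """Map a year to the first year of its 5-year interval (2001, 2006, ...)."""
    return START_YEAR + ((year - START_YEAR) // SPAN) * SPAN


def summarize(rows):
    """Group rows into 5-year intervals and compute summary statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[interval_start(row["list_year"])].append(row)

    summary = {}
    for start in sorted(groups):
        group = groups[start]
        summary[(start, start + SPAN - 1)] = {
            "median_sale_price": median(r["sale_amount"] for r in group),
            "median_sales_ratio": median(r["sales_ratio"] for r in group),
            "number_of_sales": len(group),
        }
    return summary


for (first, last), stats in summarize(records).items():
    print(f"{first}-{last}: {stats}")
```

With 2001 as the starting year, the 2001 to 2020 range splits evenly into four intervals: 2001-2005, 2006-2010, 2011-2015, and 2016-2020.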
We wrote a Python script that uses BeautifulSoup to scrape the first 10 pages of property listings in Connecticut. We wanted to analyze current property prices and compare them with the historical prices from the data.ct.gov .json dataset.
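To illustrate the parsing side of a scraper like this, here is a minimal BeautifulSoup sketch. The HTML below is a stand-in: the tag and class names (`listing`, `price`, `beds`, `baths`, `city`) are hypothetical, and a real listings page will need different selectors. Fetching the 10 pages over the network is omitted; a function like `parse_listings` would be called once per downloaded page.

```python
from bs4 import BeautifulSoup

# Stand-in HTML with hypothetical class names, not a real listings page.
SAMPLE_PAGE = """
<ul>
  <li class="listing">
    <span class="price">$450,000</span>
    <span class="beds">3</span>
    <span class="baths">2</span>
    <span class="city">Hartford</span>
  </li>
  <li class="listing">
    <span class="price">$1,200,000</span>
    <span class="beds">5</span>
    <span class="baths">4</span>
    <span class="city">Greenwich</span>
  </li>
</ul>
"""


def text_of(card, class_name):
    """Return the stripped text of the first span with the given class."""
    return card.find("span", class_=class_name).get_text(strip=True)


def parse_listings(html):
    """Extract one dict per listing card from a page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("li", class_="listing"):
        price_text = text_of(card, "price").replace("$", "").replace(",", "")
        rows.append({
            "Price": int(price_text),
            "Bed": int(text_of(card, "beds")),
            "Bath": float(text_of(card, "baths")),  # float allows half baths
            "City": text_of(card, "city"),
        })
    return rows


for row in parse_listings(SAMPLE_PAGE):
    print(row)
```

The dictionary keys are chosen to match the `Price`, `Bed`, `Bath`, and `City` columns used by the plotting code in this section.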
We stored the scraped data in a .csv file for local access. Here's what some of the data looks like:
import pandas as pd

# Load the scraped listings and preview the first five rows
df = pd.read_csv('realtor_connecticut.csv')
df.head(5)
Here is a code snippet from our Python script:
Using the data, we calculated the median and mean price of current listings.
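A minimal sketch of that calculation with pandas is shown below. The prices are made up, and the numeric `Price` column is an assumption based on how the plotting code in this section uses it.

```python
import pandas as pd

# Made-up prices; a numeric "Price" column is assumed.
df = pd.DataFrame({"Price": [325_000, 450_000, 480_000, 2_500_000]})

median_price = df["Price"].median()  # middle value, robust to outliers
mean_price = df["Price"].mean()      # arithmetic average

print(f"Median price: ${median_price:,.0f}")
print(f"Mean price:   ${mean_price:,.0f}")
```

Reporting both numbers is useful because a few very expensive listings pull the mean well above the median, as the single $2,500,000 entry does in this sample.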
Here are some plots from the aggregated data:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")

# Histogram of listing prices in $100,000 bins up to $1,000,000
bins = list(range(0, 1000001, 100000))
sns.histplot(df, x="Price", bins=bins, binrange=[0, 1000000], color="purple").set(
    title="Prices of Connecticut Homes"
)
plt.ticklabel_format(style="plain", axis="x")
plt.show()

# Horizontal box plots of bedroom and bathroom counts
sns.boxplot(data=df[["Bed", "Bath"]], palette="flare", orient="h", width=0.3).set(
    title="Distribution of Beds and Baths in Connecticut Homes"
)
plt.xlim(0, 7.5)
plt.show()

# Bar chart of the ten cities with the highest average listing price
city_order = df.groupby("City")["Price"].mean().sort_values(ascending=False).index
sns.barplot(
    data=df, x="City", y="Price", order=city_order[:10], errorbar=None, palette="flare"
)
plt.xlabel("City")
plt.ylabel("Average Price")
plt.title("Most Expensive Cities in Connecticut (Average Price)")
plt.xticks(rotation=45)
plt.ticklabel_format(style="plain", axis="y")
plt.show()
For full insights and recommendations, see our report at this link:
https://colab.research.google.com/drive/1ymcqtzDyNP6t2T0wbVGt1WBO5a0-ae0s?usp=sharing
Team 7